Learning Context for Text Categorization
Authors
Abstract
This paper describes our work on discovering context for text document categorization. The categorization approach combines a learning paradigm known as relation extraction with a technique known as context discovery. We demonstrate the effectiveness of the approach using the Reuters-21578 dataset and synthetic real-world data from the sports domain. Our experimental results indicate that the learned context greatly improves categorization performance compared to traditional categorization approaches.
Similar resources
Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA
With the explosive growth in the amount of information, tools and methods are needed to search, filter and manage resources. One of the major problems in text classification relates to high-dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of the feature space. There are many feature selection methods. However...
Feature Selection for Effective Text Classification using Semantic Information
Text categorization is the task of assigning text or documents to pre-specified classes or categories. To classify documents better, text-based learning needs to understand context: just as humans decide the relevance of a text through the context associated with it, machine learning needs to incorporate context information with the text for better clas...
Context-sensitive Learning Methods for Text Categorization
Two recently implemented machine learning algorithms, RIPPER and sleeping experts, are evaluated on a number of large text categorization problems. These algorithms both construct classifiers that allow the "context" of a word w to affect how (or even whether) the presence or absence of w will contribute to a classification. However, RIPPER and sleeping experts differ radically in many other resp...
A framework for text categorization
The field of automatic Text Categorization (TC) concerns the creation of categorizer functions, usually involving Machine Learning techniques, to assign labels from a pre-defined set of categories to documents based on the documents’ content. Because of the many variations on how this can be achieved and the diversity of applications in which it can be employed, creating specific TC application...
Sampling Strategies and Learning Efficiency in Text Categorization
This paper studies training set sampling strategies in the context of statistical learning for text categorization. It is argued that sampling strategies favoring common categories are superior to uniform coverage or mistake-driven approaches, if performance is measured by globally assessed precision and recall. The hypothesis is empirically validated by examining the performance of a nearest neighbo...
Journal title:
- CoRR
Volume abs/1112.2031, Issue -
Pages -
Publication date 2011